In this notebook we will map kmers to a square grid.

Later this will be used to map kmer counts in this grid, generating images that can be used for training a neural network.

We will use tSNE to place kmers in a 2D space so that more similar kmers are close to each other. To identify each kmer uniquely, we will count, for each one, all sub-kmers with primer number length, up to the first such kmer with lenght above half of our focal sequence. These counts uniquely identify each kmer, and we will use these as properties in tSNE to map them.

Finally, we will rotate tSNE results so that the axis of AT richness goes left to right. This should not affect a neural network but is nice for we humans to visualize.

We will then fit these coordinates to a grid so that each kmer occupies only one grid cell.

We finally export this grid information to use in python to generate figures from kmer counts.

Let’s start by setting the kmer length and number of available cores for computation.

rm(list = ls())
kmer_len = 9
ncores = 32

First, read packages and setup parallel computing:

library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
  method         from
  print.tbl_lazy     
  print.tbl_sql      
Failed to create bus connection: No such file or directory
Warning in system("timedatectl", intern = TRUE) :
  running command 'timedatectl' had status 1
── Attaching packages ───────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.3     ✓ purrr   0.3.4
✓ tibble  3.1.0     ✓ dplyr   1.0.5
✓ tidyr   1.1.3     ✓ stringr 1.4.0
✓ readr   1.4.0     ✓ forcats 0.5.1
── Conflicts ──────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
library(kmer)
library(Rtsne)
library(furrr)
Loading required package: future

Attaching package: ‘future’

The following object is masked from ‘package:kmer’:

    cluster
library(acss)
Loading required package: acss.data
plan(multisession, workers = ncores)

Listing canonical kmers

Let’s start by listing all possible kmers.

all_kmers = expand.grid(rep(list(1:4),kmer_len))

Let’s now reduce this list only to the canonical kmers. For that, let’s create a function that retrieves the reverse complement

revcomp = function(vec){
  comp_dict = 4:1
  names(comp_dict) = 1:4
  
  vec = rev(vec)
  vec = comp_dict[vec]
  
  names(vec) = NULL
  
  return(vec)
}

Now, let’s create a function that returns the canonical version of any kmer:

get_canonical= function(vec){
  str_vec = str_c(vec,collapse='')
  str_rev = str_c(revcomp(vec),collapse='')
  
  canonical = sort(c(str_vec,str_rev))[1]
  
  return(unlist(str_split(canonical,'')))
}

Finally, let’s apply this to the matrix with all kmers and remove duplicates:

all_canonical = apply(all_kmers,1,get_canonical) %>%
  t() %>%
  as.data.frame() %>%
  distinct()

all_canonical

Counting sub-kmers

We need to score each kmer for a bunch of properties, so we can use tSNE later to order them in 2D. Here we will use counts of contained 2mers, 3mers and 5mers and so on, until we reach a size that is more than half of the original kmer. With this, counts for each kmer will be unique. The fastest way to do that is to transform everything to DNAbin format first.



to_DNAbin = function(x) {
  to_char = c('A','C','G','T')
  ape::as.DNAbin(str_c(to_char[as.integer(unlist(x))]))
}


all_DNAbins = all_canonical %>%
  split(f = 1:nrow(.)) %>%
  furrr:::future_map(to_DNAbin) %>%
  tibble(forward = .) %>%
  rowwise %>%
  mutate(rc = list(ape::complement(forward))) %>%
  ungroup
  
all_DNAbins 
NA

Now let’s list all sizes for sub-kmers that we will count:

sub_lens = 2
past_half = FALSE

for (i in 3:(kmer_len)){
  if (i > kmer_len/2){past_half = TRUE}
  
  if (all(as.logical(i %% sub_lens))){
    sub_lens = c(sub_lens,i)
    if (past_half) {break}
  }
  
}

sub_lens
[1] 2 3 5

Now we can count sub-kmers. For each canonical kmer, we will average counts with its reverse complement.

subcounts = list()

for (subl in sub_lens){
  subcounts = append(subcounts, list(all_DNAbins %>%
  split(1:nrow(.)) %>%
  furrr::future_map_dfr(~ kmer::kcount(.x$forward,k = subl) %>% as_tibble %>%
                  bind_rows(kmer::kcount(.x$rc, k = subl) %>% as_tibble) %>%
                  summarise_all(mean), .options = furrr_options(seed = TRUE)))
  )
}
subcounts
[[1]]

[[2]]

[[3]]
save.image()

tSNE

Let’s now do the ordination with t-SNE. We will bind counts of sub-kmers and normalize counts. We will use a higher perplexity than the default since it seems to spread points better. We will also increase the number of iterations to make sure the configuration converges.

normalized = do.call(bind_cols, subcounts) %>%
  as.matrix %>%
  Rtsne::normalize_input()

tsne = Rtsne(normalized,
             dims=2,
             verbose = T, 
             normalize = F,
             num_threads=ncores, 
             perplexity=min(1500,nrow(normalized)/3-1),
             max_iter = 20000)
Performing PCA
Read the 131072 x 50 data matrix successfully!
OpenMP is working. 32 threads.
Using no_dims = 2, perplexity = 1500.000000, and theta = 0.500000
Computing input similarities...
Building tree...
 - point 10000 of 131072
 - point 20000 of 131072
 - point 30000 of 131072
 - point 40000 of 131072
 - point 50000 of 131072
 - point 60000 of 131072
 - point 70000 of 131072
 - point 80000 of 131072
 - point 90000 of 131072
 - point 100000 of 131072
 - point 110000 of 131072
 - point 120000 of 131072
 - point 130000 of 131072
Done in 6220.85 seconds (sparsity = 0.042250)!
Learning embedding...
Iteration 50: error is 82.790487 (50 iterations in 217.49 seconds)
Iteration 100: error is 82.790487 (50 iterations in 212.74 seconds)
Iteration 150: error is 82.790487 (50 iterations in 210.48 seconds)
Iteration 200: error is 82.790487 (50 iterations in 211.76 seconds)
Iteration 250: error is 82.790487 (50 iterations in 208.27 seconds)
Iteration 300: error is 4.413780 (50 iterations in 335.87 seconds)
Iteration 350: error is 3.646031 (50 iterations in 465.54 seconds)
Iteration 400: error is 3.161301 (50 iterations in 226.58 seconds)
Iteration 450: error is 2.915507 (50 iterations in 212.37 seconds)
Iteration 500: error is 2.781788 (50 iterations in 216.11 seconds)
Iteration 550: error is 2.688591 (50 iterations in 210.83 seconds)
Iteration 600: error is 2.617993 (50 iterations in 205.90 seconds)
Iteration 650: error is 2.562067 (50 iterations in 209.80 seconds)
Iteration 700: error is 2.515813 (50 iterations in 208.10 seconds)
Iteration 750: error is 2.477901 (50 iterations in 213.49 seconds)
Iteration 800: error is 2.446787 (50 iterations in 210.70 seconds)
Iteration 850: error is 2.420842 (50 iterations in 203.65 seconds)
Iteration 900: error is 2.398360 (50 iterations in 207.98 seconds)
Iteration 950: error is 2.378982 (50 iterations in 207.89 seconds)
Iteration 1000: error is 2.361883 (50 iterations in 205.89 seconds)
Iteration 1050: error is 2.346214 (50 iterations in 202.36 seconds)
Iteration 1100: error is 2.332699 (50 iterations in 209.26 seconds)
Iteration 1150: error is 2.320063 (50 iterations in 201.42 seconds)
Iteration 1200: error is 2.308959 (50 iterations in 200.78 seconds)
Iteration 1250: error is 2.298824 (50 iterations in 201.07 seconds)
Iteration 1300: error is 2.289545 (50 iterations in 201.93 seconds)
Iteration 1350: error is 2.281075 (50 iterations in 200.68 seconds)
Iteration 1400: error is 2.273332 (50 iterations in 200.58 seconds)
Iteration 1450: error is 2.266132 (50 iterations in 200.13 seconds)
Iteration 1500: error is 2.259635 (50 iterations in 204.97 seconds)
Iteration 1550: error is 2.253723 (50 iterations in 201.11 seconds)
Iteration 1600: error is 2.248299 (50 iterations in 201.59 seconds)
Iteration 1650: error is 2.243098 (50 iterations in 206.24 seconds)
Iteration 1700: error is 2.238029 (50 iterations in 206.57 seconds)
Iteration 1750: error is 2.233092 (50 iterations in 207.22 seconds)
Iteration 1800: error is 2.228613 (50 iterations in 202.85 seconds)
Iteration 1850: error is 2.224584 (50 iterations in 206.88 seconds)
Iteration 1900: error is 2.221032 (50 iterations in 203.58 seconds)
Iteration 1950: error is 2.217820 (50 iterations in 203.45 seconds)
Iteration 2000: error is 2.214697 (50 iterations in 202.79 seconds)
Iteration 2050: error is 2.211697 (50 iterations in 203.62 seconds)
Iteration 2100: error is 2.209054 (50 iterations in 201.69 seconds)
Iteration 2150: error is 2.206653 (50 iterations in 207.88 seconds)
Iteration 2200: error is 2.204421 (50 iterations in 205.72 seconds)
Iteration 2250: error is 2.202281 (50 iterations in 208.01 seconds)
Iteration 2300: error is 2.200328 (50 iterations in 204.81 seconds)
Iteration 2350: error is 2.198507 (50 iterations in 202.69 seconds)
Iteration 2400: error is 2.196793 (50 iterations in 203.13 seconds)
Iteration 2450: error is 2.195210 (50 iterations in 203.75 seconds)
Iteration 2500: error is 2.193710 (50 iterations in 202.55 seconds)
Iteration 2550: error is 2.192282 (50 iterations in 201.83 seconds)
Iteration 2600: error is 2.190940 (50 iterations in 203.65 seconds)
Iteration 2650: error is 2.189690 (50 iterations in 204.51 seconds)
Iteration 2700: error is 2.188519 (50 iterations in 202.71 seconds)
Iteration 2750: error is 2.187431 (50 iterations in 203.03 seconds)
Iteration 2800: error is 2.186397 (50 iterations in 201.27 seconds)
Iteration 2850: error is 2.185412 (50 iterations in 205.41 seconds)
Iteration 2900: error is 2.184506 (50 iterations in 203.16 seconds)
Iteration 2950: error is 2.183636 (50 iterations in 202.93 seconds)
Iteration 3000: error is 2.182852 (50 iterations in 209.84 seconds)
Iteration 3050: error is 2.182074 (50 iterations in 203.07 seconds)
Iteration 3100: error is 2.181354 (50 iterations in 205.64 seconds)
Iteration 3150: error is 2.180658 (50 iterations in 202.40 seconds)
Iteration 3200: error is 2.179981 (50 iterations in 203.97 seconds)
Iteration 3250: error is 2.179345 (50 iterations in 207.38 seconds)
Iteration 3300: error is 2.178711 (50 iterations in 205.05 seconds)
Iteration 3350: error is 2.178132 (50 iterations in 202.96 seconds)
Iteration 3400: error is 2.177586 (50 iterations in 202.81 seconds)
Iteration 3450: error is 2.177032 (50 iterations in 201.10 seconds)
Iteration 3500: error is 2.176492 (50 iterations in 204.29 seconds)
Iteration 3550: error is 2.175978 (50 iterations in 200.96 seconds)
Iteration 3600: error is 2.175473 (50 iterations in 201.02 seconds)
Iteration 3650: error is 2.174989 (50 iterations in 203.70 seconds)
Iteration 3700: error is 2.174536 (50 iterations in 203.05 seconds)
Iteration 3750: error is 2.174069 (50 iterations in 202.01 seconds)
Iteration 3800: error is 2.173615 (50 iterations in 202.52 seconds)
Iteration 3850: error is 2.173179 (50 iterations in 201.21 seconds)
Iteration 3900: error is 2.172771 (50 iterations in 200.33 seconds)
Iteration 3950: error is 2.172356 (50 iterations in 200.71 seconds)
Iteration 4000: error is 2.171933 (50 iterations in 201.86 seconds)
Iteration 4050: error is 2.171526 (50 iterations in 199.09 seconds)
Iteration 4100: error is 2.171141 (50 iterations in 203.63 seconds)
Iteration 4150: error is 2.170778 (50 iterations in 200.27 seconds)
Iteration 4200: error is 2.170408 (50 iterations in 201.61 seconds)
Iteration 4250: error is 2.170047 (50 iterations in 203.50 seconds)
Iteration 4300: error is 2.169709 (50 iterations in 201.05 seconds)
Iteration 4350: error is 2.169374 (50 iterations in 201.20 seconds)
Iteration 4400: error is 2.169038 (50 iterations in 203.32 seconds)
Iteration 4450: error is 2.168714 (50 iterations in 199.20 seconds)
Iteration 4500: error is 2.168376 (50 iterations in 202.67 seconds)
Iteration 4550: error is 2.168044 (50 iterations in 199.38 seconds)
Iteration 4600: error is 2.167717 (50 iterations in 200.43 seconds)
Iteration 4650: error is 2.167399 (50 iterations in 199.66 seconds)
Iteration 4700: error is 2.167085 (50 iterations in 200.81 seconds)
Iteration 4750: error is 2.166774 (50 iterations in 200.78 seconds)
Iteration 4800: error is 2.166494 (50 iterations in 199.67 seconds)
Iteration 4850: error is 2.166206 (50 iterations in 198.98 seconds)
Iteration 4900: error is 2.165924 (50 iterations in 201.55 seconds)
Iteration 4950: error is 2.165645 (50 iterations in 203.75 seconds)
Iteration 5000: error is 2.165365 (50 iterations in 199.88 seconds)
Iteration 5050: error is 2.165107 (50 iterations in 198.18 seconds)
Iteration 5100: error is 2.164858 (50 iterations in 200.95 seconds)
Iteration 5150: error is 2.164618 (50 iterations in 204.51 seconds)
Iteration 5200: error is 2.164397 (50 iterations in 202.49 seconds)
Iteration 5250: error is 2.164185 (50 iterations in 202.70 seconds)
Iteration 5300: error is 2.163980 (50 iterations in 199.74 seconds)
Iteration 5350: error is 2.163801 (50 iterations in 208.78 seconds)
Iteration 5400: error is 2.163618 (50 iterations in 208.27 seconds)
Iteration 5450: error is 2.163440 (50 iterations in 200.69 seconds)
Iteration 5500: error is 2.163264 (50 iterations in 207.47 seconds)
Iteration 5550: error is 2.163095 (50 iterations in 201.48 seconds)
Iteration 5600: error is 2.162919 (50 iterations in 197.79 seconds)
Iteration 5650: error is 2.162771 (50 iterations in 202.30 seconds)
Iteration 5700: error is 2.162611 (50 iterations in 201.37 seconds)
Iteration 5750: error is 2.162442 (50 iterations in 202.47 seconds)
Iteration 5800: error is 2.162290 (50 iterations in 199.86 seconds)
Iteration 5850: error is 2.162123 (50 iterations in 202.67 seconds)
Iteration 5900: error is 2.161963 (50 iterations in 199.46 seconds)
Iteration 5950: error is 2.161798 (50 iterations in 202.33 seconds)
Iteration 6000: error is 2.161640 (50 iterations in 204.76 seconds)
Iteration 6050: error is 2.161483 (50 iterations in 203.81 seconds)
Iteration 6100: error is 2.161309 (50 iterations in 202.07 seconds)
Iteration 6150: error is 2.161175 (50 iterations in 203.97 seconds)
Iteration 6200: error is 2.161009 (50 iterations in 200.67 seconds)
Iteration 6250: error is 2.160857 (50 iterations in 201.91 seconds)
Iteration 6300: error is 2.160734 (50 iterations in 199.36 seconds)
Iteration 6350: error is 2.160591 (50 iterations in 201.03 seconds)
Iteration 6400: error is 2.160432 (50 iterations in 199.02 seconds)
Iteration 6450: error is 2.160293 (50 iterations in 200.84 seconds)
Iteration 6500: error is 2.160122 (50 iterations in 200.51 seconds)
Iteration 6550: error is 2.159995 (50 iterations in 201.40 seconds)
Iteration 6600: error is 2.159845 (50 iterations in 201.50 seconds)
Iteration 6650: error is 2.159697 (50 iterations in 203.09 seconds)
Iteration 6700: error is 2.159522 (50 iterations in 200.18 seconds)
Iteration 6750: error is 2.159335 (50 iterations in 204.23 seconds)
Iteration 6800: error is 2.159162 (50 iterations in 201.25 seconds)
Iteration 6850: error is 2.158987 (50 iterations in 201.82 seconds)
Iteration 6900: error is 2.158799 (50 iterations in 201.14 seconds)
Iteration 6950: error is 2.158614 (50 iterations in 204.02 seconds)
Iteration 7000: error is 2.158429 (50 iterations in 202.13 seconds)
Iteration 7050: error is 2.158223 (50 iterations in 201.30 seconds)
Iteration 7100: error is 2.158020 (50 iterations in 201.75 seconds)
Iteration 7150: error is 2.157847 (50 iterations in 200.17 seconds)
Iteration 7200: error is 2.157696 (50 iterations in 201.53 seconds)
Iteration 7250: error is 2.157528 (50 iterations in 201.55 seconds)
Iteration 7300: error is 2.157383 (50 iterations in 200.21 seconds)
Iteration 7350: error is 2.157235 (50 iterations in 204.00 seconds)
Iteration 7400: error is 2.157077 (50 iterations in 200.62 seconds)
Iteration 7450: error is 2.156935 (50 iterations in 201.87 seconds)
Iteration 7500: error is 2.156776 (50 iterations in 199.14 seconds)
Iteration 7550: error is 2.156602 (50 iterations in 200.16 seconds)
Iteration 7600: error is 2.156437 (50 iterations in 199.33 seconds)
Iteration 7650: error is 2.156249 (50 iterations in 200.44 seconds)
Iteration 7700: error is 2.156148 (50 iterations in 203.81 seconds)
Iteration 7750: error is 2.155992 (50 iterations in 200.82 seconds)
Iteration 7800: error is 2.155853 (50 iterations in 198.98 seconds)
Iteration 7850: error is 2.155674 (50 iterations in 200.42 seconds)
Iteration 7900: error is 2.155492 (50 iterations in 198.90 seconds)
Iteration 7950: error is 2.155366 (50 iterations in 200.84 seconds)
Iteration 8000: error is 2.155255 (50 iterations in 199.48 seconds)
Iteration 8050: error is 2.155171 (50 iterations in 201.47 seconds)
Iteration 8100: error is 2.155116 (50 iterations in 201.35 seconds)
Iteration 8150: error is 2.155059 (50 iterations in 203.43 seconds)
Iteration 8200: error is 2.154988 (50 iterations in 199.43 seconds)
Iteration 8250: error is 2.154928 (50 iterations in 204.69 seconds)
Iteration 8300: error is 2.154847 (50 iterations in 200.88 seconds)
Iteration 8350: error is 2.154757 (50 iterations in 204.17 seconds)
Iteration 8400: error is 2.154662 (50 iterations in 197.57 seconds)
Iteration 8450: error is 2.154566 (50 iterations in 200.14 seconds)
Iteration 8500: error is 2.154444 (50 iterations in 200.28 seconds)
Iteration 8550: error is 2.154343 (50 iterations in 201.71 seconds)
Iteration 8600: error is 2.154234 (50 iterations in 200.05 seconds)
Iteration 8650: error is 2.154117 (50 iterations in 200.50 seconds)
Iteration 8700: error is 2.154022 (50 iterations in 199.96 seconds)
Iteration 8750: error is 2.153912 (50 iterations in 200.13 seconds)
Iteration 8800: error is 2.153801 (50 iterations in 199.67 seconds)
Iteration 8850: error is 2.153682 (50 iterations in 201.24 seconds)
Iteration 8900: error is 2.153564 (50 iterations in 199.52 seconds)
Iteration 8950: error is 2.153446 (50 iterations in 200.72 seconds)
Iteration 9000: error is 2.153323 (50 iterations in 200.98 seconds)
Iteration 9050: error is 2.153203 (50 iterations in 199.71 seconds)
Iteration 9100: error is 2.153090 (50 iterations in 202.64 seconds)
Iteration 9150: error is 2.152962 (50 iterations in 199.98 seconds)
Iteration 9200: error is 2.152835 (50 iterations in 201.56 seconds)
Iteration 9250: error is 2.152723 (50 iterations in 199.90 seconds)
Iteration 9300: error is 2.152590 (50 iterations in 203.92 seconds)
Iteration 9350: error is 2.152476 (50 iterations in 200.34 seconds)
Iteration 9400: error is 2.152359 (50 iterations in 200.01 seconds)
Iteration 9450: error is 2.152247 (50 iterations in 199.59 seconds)
Iteration 9500: error is 2.152131 (50 iterations in 200.43 seconds)
Iteration 9550: error is 2.152024 (50 iterations in 199.55 seconds)
Iteration 9600: error is 2.151917 (50 iterations in 201.05 seconds)
Iteration 9650: error is 2.151790 (50 iterations in 202.13 seconds)
Iteration 9700: error is 2.151669 (50 iterations in 201.28 seconds)
Iteration 9750: error is 2.151569 (50 iterations in 205.86 seconds)
Iteration 9800: error is 2.151450 (50 iterations in 202.47 seconds)
Iteration 9850: error is 2.151323 (50 iterations in 204.15 seconds)
Iteration 9900: error is 2.151210 (50 iterations in 208.24 seconds)
Iteration 9950: error is 2.151109 (50 iterations in 206.54 seconds)
Iteration 10000: error is 2.151037 (50 iterations in 202.49 seconds)
Iteration 10050: error is 2.150958 (50 iterations in 201.21 seconds)
Iteration 10100: error is 2.150861 (50 iterations in 205.20 seconds)
Iteration 10150: error is 2.150771 (50 iterations in 199.54 seconds)
Iteration 10200: error is 2.150698 (50 iterations in 200.41 seconds)
Iteration 10250: error is 2.150609 (50 iterations in 200.81 seconds)
Iteration 10300: error is 2.150524 (50 iterations in 199.36 seconds)
Iteration 10350: error is 2.150444 (50 iterations in 201.12 seconds)
Iteration 10400: error is 2.150360 (50 iterations in 201.15 seconds)
Iteration 10450: error is 2.150294 (50 iterations in 200.10 seconds)
Iteration 10500: error is 2.150199 (50 iterations in 210.08 seconds)
Iteration 10550: error is 2.150104 (50 iterations in 202.45 seconds)
Iteration 10600: error is 2.150027 (50 iterations in 199.33 seconds)
Iteration 10650: error is 2.149931 (50 iterations in 199.74 seconds)
Iteration 10700: error is 2.149847 (50 iterations in 201.27 seconds)
Iteration 10750: error is 2.149769 (50 iterations in 200.42 seconds)
Iteration 10800: error is 2.149683 (50 iterations in 198.86 seconds)
Iteration 10850: error is 2.149599 (50 iterations in 203.34 seconds)
Iteration 10900: error is 2.149523 (50 iterations in 199.97 seconds)
Iteration 10950: error is 2.149448 (50 iterations in 201.02 seconds)
Iteration 11000: error is 2.149370 (50 iterations in 199.01 seconds)
Iteration 11050: error is 2.149289 (50 iterations in 202.12 seconds)
Iteration 11100: error is 2.149225 (50 iterations in 200.70 seconds)
Iteration 11150: error is 2.149143 (50 iterations in 200.25 seconds)
Iteration 11200: error is 2.149087 (50 iterations in 198.94 seconds)
Iteration 11250: error is 2.149019 (50 iterations in 200.59 seconds)
Iteration 11300: error is 2.148946 (50 iterations in 198.35 seconds)
Iteration 11350: error is 2.148876 (50 iterations in 200.95 seconds)
Iteration 11400: error is 2.148799 (50 iterations in 201.85 seconds)
Iteration 11450: error is 2.148714 (50 iterations in 202.78 seconds)
Iteration 11500: error is 2.148654 (50 iterations in 199.61 seconds)
Iteration 11550: error is 2.148579 (50 iterations in 202.09 seconds)
Iteration 11600: error is 2.148504 (50 iterations in 198.76 seconds)
Iteration 11650: error is 2.148433 (50 iterations in 201.91 seconds)
Iteration 11700: error is 2.148384 (50 iterations in 198.72 seconds)
Iteration 11750: error is 2.148323 (50 iterations in 201.48 seconds)
Iteration 11800: error is 2.148258 (50 iterations in 200.76 seconds)
Iteration 11850: error is 2.148199 (50 iterations in 202.42 seconds)
Iteration 11900: error is 2.148143 (50 iterations in 200.64 seconds)
Iteration 11950: error is 2.148068 (50 iterations in 201.59 seconds)
Iteration 12000: error is 2.148002 (50 iterations in 201.22 seconds)
Iteration 12050: error is 2.147927 (50 iterations in 202.53 seconds)
Iteration 12100: error is 2.147866 (50 iterations in 200.83 seconds)
Iteration 12150: error is 2.147802 (50 iterations in 205.20 seconds)
Iteration 12200: error is 2.147736 (50 iterations in 203.52 seconds)
Iteration 12250: error is 2.147678 (50 iterations in 208.69 seconds)
Iteration 12300: error is 2.147605 (50 iterations in 203.27 seconds)
Iteration 12350: error is 2.147533 (50 iterations in 199.31 seconds)
Iteration 12400: error is 2.147469 (50 iterations in 201.18 seconds)
Iteration 12450: error is 2.147396 (50 iterations in 198.18 seconds)
Iteration 12500: error is 2.147315 (50 iterations in 199.33 seconds)
Iteration 12550: error is 2.147245 (50 iterations in 198.99 seconds)
Iteration 12600: error is 2.147186 (50 iterations in 200.03 seconds)
Iteration 12650: error is 2.147122 (50 iterations in 199.51 seconds)
Iteration 12700: error is 2.147069 (50 iterations in 198.72 seconds)
Iteration 12750: error is 2.147007 (50 iterations in 199.30 seconds)
Iteration 12800: error is 2.146950 (50 iterations in 200.62 seconds)
Iteration 12850: error is 2.146890 (50 iterations in 198.77 seconds)
Iteration 12900: error is 2.146845 (50 iterations in 199.47 seconds)
Iteration 12950: error is 2.146796 (50 iterations in 203.13 seconds)
Iteration 13000: error is 2.146742 (50 iterations in 200.73 seconds)
Iteration 13050: error is 2.146704 (50 iterations in 201.32 seconds)
Iteration 13100: error is 2.146670 (50 iterations in 201.52 seconds)
Iteration 13150: error is 2.146611 (50 iterations in 202.52 seconds)
Iteration 13200: error is 2.146556 (50 iterations in 201.90 seconds)
Iteration 13250: error is 2.146519 (50 iterations in 200.60 seconds)
Iteration 13300: error is 2.146485 (50 iterations in 199.29 seconds)
Iteration 13350: error is 2.146426 (50 iterations in 201.99 seconds)
Iteration 13400: error is 2.146386 (50 iterations in 201.32 seconds)
Iteration 13450: error is 2.146347 (50 iterations in 203.86 seconds)
Iteration 13500: error is 2.146290 (50 iterations in 200.10 seconds)
Iteration 13550: error is 2.146276 (50 iterations in 201.97 seconds)
Iteration 13600: error is 2.146226 (50 iterations in 199.13 seconds)
Iteration 13650: error is 2.146190 (50 iterations in 202.21 seconds)
Iteration 13700: error is 2.146148 (50 iterations in 199.26 seconds)
Iteration 13750: error is 2.146118 (50 iterations in 202.34 seconds)
Iteration 13800: error is 2.146064 (50 iterations in 198.62 seconds)
Iteration 13850: error is 2.146035 (50 iterations in 201.25 seconds)
Iteration 13900: error is 2.146011 (50 iterations in 197.72 seconds)
Iteration 13950: error is 2.145974 (50 iterations in 200.42 seconds)
Iteration 14000: error is 2.145947 (50 iterations in 202.55 seconds)
Iteration 14050: error is 2.145921 (50 iterations in 202.90 seconds)
Iteration 14100: error is 2.145882 (50 iterations in 201.43 seconds)
Iteration 14150: error is 2.145840 (50 iterations in 201.70 seconds)
Iteration 14200: error is 2.145798 (50 iterations in 200.69 seconds)
Iteration 14250: error is 2.145772 (50 iterations in 200.23 seconds)
Iteration 14300: error is 2.145725 (50 iterations in 204.89 seconds)
Iteration 14350: error is 2.145700 (50 iterations in 201.38 seconds)
Iteration 14400: error is 2.145680 (50 iterations in 200.29 seconds)
Iteration 14450: error is 2.145642 (50 iterations in 204.31 seconds)
Iteration 14500: error is 2.145620 (50 iterations in 199.97 seconds)
Iteration 14550: error is 2.145608 (50 iterations in 198.16 seconds)
Iteration 14600: error is 2.145575 (50 iterations in 199.07 seconds)
Iteration 14650: error is 2.145543 (50 iterations in 199.43 seconds)
Iteration 14700: error is 2.145517 (50 iterations in 202.47 seconds)
Iteration 14750: error is 2.145492 (50 iterations in 201.86 seconds)
Iteration 14800: error is 2.145468 (50 iterations in 199.54 seconds)
Iteration 14850: error is 2.145442 (50 iterations in 201.67 seconds)
Iteration 14900: error is 2.145416 (50 iterations in 198.57 seconds)
Iteration 14950: error is 2.145375 (50 iterations in 201.24 seconds)
Iteration 15000: error is 2.145349 (50 iterations in 199.04 seconds)
Iteration 15050: error is 2.145320 (50 iterations in 201.50 seconds)
Iteration 15100: error is 2.145293 (50 iterations in 200.29 seconds)
Iteration 15150: error is 2.145264 (50 iterations in 200.66 seconds)
Iteration 15200: error is 2.145234 (50 iterations in 198.02 seconds)
Iteration 15250: error is 2.145202 (50 iterations in 200.50 seconds)
Iteration 15300: error is 2.145160 (50 iterations in 199.78 seconds)
Iteration 15350: error is 2.145126 (50 iterations in 201.38 seconds)
Iteration 15400: error is 2.145107 (50 iterations in 201.21 seconds)
Iteration 15450: error is 2.145090 (50 iterations in 199.98 seconds)
Iteration 15500: error is 2.145057 (50 iterations in 200.70 seconds)
Iteration 15550: error is 2.145022 (50 iterations in 198.84 seconds)
Iteration 15600: error is 2.144988 (50 iterations in 198.11 seconds)
Iteration 15650: error is 2.144965 (50 iterations in 202.03 seconds)
Iteration 15700: error is 2.144934 (50 iterations in 200.82 seconds)
Iteration 15750: error is 2.144915 (50 iterations in 198.63 seconds)
Iteration 15800: error is 2.144904 (50 iterations in 201.15 seconds)
Iteration 15850: error is 2.144886 (50 iterations in 199.94 seconds)
Iteration 15900: error is 2.144863 (50 iterations in 201.44 seconds)
Iteration 15950: error is 2.144841 (50 iterations in 199.20 seconds)
Iteration 16000: error is 2.144819 (50 iterations in 201.20 seconds)
Iteration 16050: error is 2.144817 (50 iterations in 200.99 seconds)
Iteration 16100: error is 2.144778 (50 iterations in 201.66 seconds)
Iteration 16150: error is 2.144754 (50 iterations in 199.93 seconds)
Iteration 16200: error is 2.144745 (50 iterations in 202.70 seconds)
Iteration 16250: error is 2.144725 (50 iterations in 200.60 seconds)
Iteration 16300: error is 2.144704 (50 iterations in 203.09 seconds)
Iteration 16350: error is 2.144669 (50 iterations in 198.29 seconds)
Iteration 16400: error is 2.144651 (50 iterations in 203.34 seconds)
Iteration 16450: error is 2.144628 (50 iterations in 199.26 seconds)
Iteration 16500: error is 2.144622 (50 iterations in 201.96 seconds)
Iteration 16550: error is 2.144592 (50 iterations in 199.28 seconds)
Iteration 16600: error is 2.144558 (50 iterations in 203.12 seconds)
Iteration 16650: error is 2.144545 (50 iterations in 198.73 seconds)
Iteration 16700: error is 2.144514 (50 iterations in 204.88 seconds)
Iteration 16750: error is 2.144501 (50 iterations in 212.55 seconds)
Iteration 16800: error is 2.144479 (50 iterations in 205.25 seconds)
Iteration 16850: error is 2.144458 (50 iterations in 201.89 seconds)
Iteration 16900: error is 2.144428 (50 iterations in 200.30 seconds)
Iteration 16950: error is 2.144408 (50 iterations in 199.31 seconds)
Iteration 17000: error is 2.144396 (50 iterations in 201.54 seconds)
Iteration 17050: error is 2.144348 (50 iterations in 198.80 seconds)
Iteration 17100: error is 2.144337 (50 iterations in 200.07 seconds)
Iteration 17150: error is 2.144316 (50 iterations in 199.54 seconds)
Iteration 17200: error is 2.144299 (50 iterations in 200.13 seconds)
Iteration 17250: error is 2.144272 (50 iterations in 199.28 seconds)
Iteration 17300: error is 2.144259 (50 iterations in 199.25 seconds)
Iteration 17350: error is 2.144230 (50 iterations in 203.63 seconds)
Iteration 17400: error is 2.144222 (50 iterations in 201.21 seconds)
Iteration 17450: error is 2.144183 (50 iterations in 199.41 seconds)
Iteration 17500: error is 2.144149 (50 iterations in 201.91 seconds)
Iteration 17550: error is 2.144119 (50 iterations in 201.33 seconds)
Iteration 17600: error is 2.144084 (50 iterations in 202.73 seconds)
Iteration 17650: error is 2.144058 (50 iterations in 201.30 seconds)
Iteration 17700: error is 2.144030 (50 iterations in 198.62 seconds)
Iteration 17750: error is 2.144004 (50 iterations in 201.22 seconds)
Iteration 17800: error is 2.143970 (50 iterations in 202.78 seconds)
Iteration 17850: error is 2.143947 (50 iterations in 200.96 seconds)
Iteration 17900: error is 2.143926 (50 iterations in 199.20 seconds)
Iteration 17950: error is 2.143874 (50 iterations in 200.90 seconds)
Iteration 18000: error is 2.143842 (50 iterations in 200.03 seconds)
Iteration 18050: error is 2.143817 (50 iterations in 201.61 seconds)
Iteration 18100: error is 2.143799 (50 iterations in 199.64 seconds)
Iteration 18150: error is 2.143766 (50 iterations in 202.42 seconds)
Iteration 18200: error is 2.143740 (50 iterations in 199.12 seconds)
Iteration 18250: error is 2.143719 (50 iterations in 201.11 seconds)
Iteration 18300: error is 2.143682 (50 iterations in 199.44 seconds)
Iteration 18350: error is 2.143656 (50 iterations in 200.30 seconds)
Iteration 18400: error is 2.143617 (50 iterations in 198.68 seconds)
Iteration 18450: error is 2.143578 (50 iterations in 201.75 seconds)
Iteration 18500: error is 2.143548 (50 iterations in 197.73 seconds)
Iteration 18550: error is 2.143533 (50 iterations in 199.99 seconds)
Iteration 18600: error is 2.143497 (50 iterations in 199.77 seconds)
Iteration 18650: error is 2.143472 (50 iterations in 205.53 seconds)
Iteration 18700: error is 2.143445 (50 iterations in 199.92 seconds)
Iteration 18750: error is 2.143430 (50 iterations in 201.34 seconds)
Iteration 18800: error is 2.143386 (50 iterations in 207.71 seconds)
Iteration 18850: error is 2.143370 (50 iterations in 202.96 seconds)
Iteration 18900: error is 2.143348 (50 iterations in 204.34 seconds)
Iteration 18950: error is 2.143327 (50 iterations in 208.82 seconds)
Iteration 19000: error is 2.143286 (50 iterations in 212.89 seconds)
Iteration 19050: error is 2.143271 (50 iterations in 199.39 seconds)
Iteration 19100: error is 2.143245 (50 iterations in 198.83 seconds)
Iteration 19150: error is 2.143220 (50 iterations in 199.79 seconds)
Iteration 19200: error is 2.143181 (50 iterations in 198.38 seconds)
Iteration 19250: error is 2.143168 (50 iterations in 198.68 seconds)
Iteration 19300: error is 2.143136 (50 iterations in 202.16 seconds)
Iteration 19350: error is 2.143109 (50 iterations in 200.08 seconds)
Iteration 19400: error is 2.143098 (50 iterations in 203.44 seconds)
Iteration 19450: error is 2.143062 (50 iterations in 199.68 seconds)
Iteration 19500: error is 2.143044 (50 iterations in 202.67 seconds)
Iteration 19550: error is 2.143009 (50 iterations in 199.23 seconds)
Iteration 19600: error is 2.142978 (50 iterations in 201.96 seconds)
Iteration 19650: error is 2.142959 (50 iterations in 198.08 seconds)
Iteration 19700: error is 2.142925 (50 iterations in 201.34 seconds)
Iteration 19750: error is 2.142888 (50 iterations in 202.27 seconds)
Iteration 19800: error is 2.142872 (50 iterations in 201.23 seconds)
Iteration 19850: error is 2.142850 (50 iterations in 201.13 seconds)
Iteration 19900: error is 2.142820 (50 iterations in 202.13 seconds)
Iteration 19950: error is 2.142796 (50 iterations in 199.68 seconds)
Iteration 20000: error is 2.142778 (50 iterations in 202.25 seconds)
Fitting performed in 81196.80 seconds.
str(tsne)
List of 14
 $ N                  : int 131072
 $ Y                  : num [1:131072, 1:2] 11.5 11.2 11.2 19.7 10.9 ...
 $ costs              : num [1:131072] 1.07e-05 1.12e-05 1.12e-05 1.55e-05 1.20e-05 ...
 $ itercosts          : num [1:400] 82.8 82.8 82.8 82.8 82.8 ...
 $ origD              : int 50
 $ perplexity         : num 1500
 $ theta              : num 0.5
 $ max_iter           : num 20000
 $ stop_lying_iter    : int 250
 $ mom_switch_iter    : int 250
 $ momentum           : num 0.5
 $ final_momentum     : num 0.8
 $ eta                : num 200
 $ exaggeration_factor: num 12
save.image()

Let’s now visualize the results. Let’s build the data.frame first.

AT_richness = all_DNAbins %>%
  split(1:nrow(.)) %>%
  furrr::future_map_dfr(~ kmer::kcount(.x$forward,k = 1) %>% as_tibble,
                        .options = furrr_options(seed = TRUE)) %>%
  mutate(ID=1:n()) %>%
  gather(key=base,value=count,-ID) %>%
  group_by(ID) %>%
  summarise(AT = sum(ifelse(base %in% c('A','T'), count, 0)/kmer_len)) %>%
  arrange(ID)
tsne_res = tsne$Y %>%
  as.data.frame() %>%
  mutate(r = sqrt(V1^2 + V2^2),
         t = atan2(V2, V1)) %>%
  bind_cols(select(AT_richness, AT))

tsne_res$canonical = purrr::map(all_DNAbins$forward, 
                                ~as.character(.x) %>%
                                  str_to_upper %>%
                                  str_c(collapse='')) %>% unlist

tsne_res$rc = purrr::map(all_DNAbins$rc, 
                                ~as.character(.x) %>%
                                  str_to_upper %>%
                                  str_c(collapse='')) %>% unlist

tsne_res$complexity = acss(tsne_res$canonical, 4)[,1]

tsne_res
NA

Let’s now visualize what tSNE did. It clearly separated kmers by their AT-richness, along bot X and Y axes. There is also some clustering by entropy.


ggplot(tsne_res) +
  geom_point(aes(x=V1, y=V2,color=complexity)) +
  scale_color_viridis_c()



ggplot(tsne_res) +
  geom_point(aes(x=V1, y=V2,color=AT)) +
  scale_color_viridis_c()

Now let’s rotate so that more AT-rich kmers are on the left. Let’s figure out the average angle between most AT-rich and CG-rich kmers.

rotation = tsne_res %>%
  filter(AT == 1) %>%
  pull(t) %>%
  mean

tsne_res = tsne_res %>%
  mutate(V1 = r*cos(t - rotation + pi),
         V2 = r*sin(t - rotation + pi),
         r = sqrt(V1^2 + V2^2),
         t = atan2(V2, V1))
  

Now let’s make sure that complexity is roughly increasing along the Y axis, and flip if not the case.

if (cor(tsne_res$V2,tsne_res$complexity) < 0){
  tsne_res = tsne_res %>%
  mutate(V2 = -V2,
         r = sqrt(V1^2 + V2^2),
         t = atan2(V2, V1))
}

Now we have a kmer distribution that follows roughly AT richness left to right and complexity bottom to top:

ggplot(tsne_res) +
  geom_point(aes(x=V1, y=V2,color=complexity)) +
  scale_color_viridis_c() +
  coord_equal()



ggplot(tsne_res) +
  geom_point(aes(x=V1, y=V2,color=AT)) +
  scale_color_viridis_c() +
  coord_equal()

#Save

We will now save the tsne table so we can use to produce a grid for the final figures.

write_csv(tsne_res,paste0('tsne_',kmer_len,'_result.csv'))
LS0tCnRpdGxlOiAiTWFwcGluZyBrbWVycyB0byBhbiBpbWFnZSIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKSW4gdGhpcyBub3RlYm9vayB3ZSB3aWxsIG1hcCBrbWVycyB0byBhIHNxdWFyZSBncmlkLiAKCkxhdGVyIHRoaXMgd2lsbCBiZSB1c2VkIHRvIG1hcCBrbWVyIGNvdW50cyBpbiB0aGlzIGdyaWQsIGdlbmVyYXRpbmcgaW1hZ2VzIHRoYXQgY2FuIGJlIHVzZWQgZm9yIHRyYWluaW5nIGEgbmV1cmFsIG5ldHdvcmsuCgpXZSB3aWxsIHVzZSB0U05FIHRvIHBsYWNlIGttZXJzIGluIGEgMkQgc3BhY2Ugc28gdGhhdCBtb3JlIHNpbWlsYXIga21lcnMgYXJlIGNsb3NlIHRvIGVhY2ggb3RoZXIuIFRvIGlkZW50aWZ5IGVhY2gga21lciB1bmlxdWVseSwgd2Ugd2lsbCBjb3VudCwgZm9yIGVhY2ggb25lLCBhbGwgc3ViLWttZXJzIHdpdGggcHJpbWVyIG51bWJlciBsZW5ndGgsIHVwIHRvIHRoZSBmaXJzdCBzdWNoIGttZXIgd2l0aCBsZW5naHQgYWJvdmUgaGFsZiBvZiBvdXIgZm9jYWwgc2VxdWVuY2UuIFRoZXNlIGNvdW50cyB1bmlxdWVseSBpZGVudGlmeSBlYWNoIGttZXIsIGFuZCB3ZSB3aWxsIHVzZSB0aGVzZSBhcyBwcm9wZXJ0aWVzIGluIHRTTkUgdG8gbWFwIHRoZW0uIAoKRmluYWxseSwgd2Ugd2lsbCByb3RhdGUgdFNORSByZXN1bHRzIHNvIHRoYXQgdGhlIGF4aXMgb2YgQVQgcmljaG5lc3MgZ29lcyBsZWZ0IHRvIHJpZ2h0LiBUaGlzIHNob3VsZCBub3QgYWZmZWN0IGEgbmV1cmFsIG5ldHdvcmsgYnV0IGlzIG5pY2UgZm9yIHdlIGh1bWFucyB0byB2aXN1YWxpemUuCgpXZSB3aWxsIHRoZW4gZml0IHRoZXNlIGNvb3JkaW5hdGVzIHRvIGEgZ3JpZCBzbyB0aGF0IGVhY2gga21lciBvY2N1cGllcyBvbmx5IG9uZSBncmlkIGNlbGwuCgpXZSBmaW5hbGx5IGV4cG9ydCB0aGlzIGdyaWQgaW5mb3JtYXRpb24gdG8gdXNlIGluIHB5dGhvbiB0byBnZW5lcmF0ZSBmaWd1cmVzIGZyb20ga21lciBjb3VudHMuCgpMZXQncyBzdGFydCBieSBzZXR0aW5nIHRoZSBrbWVyIGxlbmd0aCBhbmQgbnVtYmVyIG9mIGF2YWlsYWJsZSBjb3JlcyBmb3IgY29tcHV0YXRpb24uCgpgYGB7cn0Kcm0obGlzdCA9IGxzKCkpCmttZXJfbGVuID0gOQpuY29yZXMgPSAzMgpgYGAKCgoKRmlyc3QsIHJlYWQgcGFja2FnZXMgYW5kIHNldHVwIHBhcmFsbGVsIGNvbXB1dGluZzoKYGBge3J9CmxpYnJhcnkodGlkeXZlcnNlKQpsaWJyYXJ5KGttZXIpCmxpYnJhcnkoUnRzbmUpCmxpYnJhcnkoZnVycnIpCmxpYnJhcnkoYWNzcykKcGxhbihtdWx0aXNlc3Npb24sIHdvcmtlcnMgPSBuY29yZXMpCmBgYAoKIyBMaXN0aW5nIGNhbm9uaWNhbCBrbWVycwoKTGV0J3Mgc3RhcnQgYnkgbGlzdGluZyBhbGwgcG9zc2libGUga21lcnMuCmBgYHtyfQphbGxfa21lcnMgPSBleHBhbmQuZ3JpZChyZXAobGlzdCgxOjQpLGttZXJfbGVuKSkKYGBgCgpMZXQncyBub3cgcmVkdWNlIHRoaXMgbGlzdCBvbmx5IHRvIHRoZSBjYW5vbmljYWwga21lcnMuIEZvciB0aGF0LCBsZXQncyBjcmVhdGUgYSBmdW5jdGlvbiB0aGF0IHJldHJpZXZlcyB0aGUgcmV2ZXJzZSBjb21wbGVtZW50CmBgYHtyfQpyZXZjb21wID0gZnVuY3Rpb24odmVjKXsKICBjb21wX2RpY3QgPSA0OjEKICBuYW1lcyhjb21wX2RpY3QpID0gMTo0CiAgCiAgdmVjID0gcmV2KHZlYykKICB2ZWMgPSBjb21wX2RpY3RbdmVjXQogIAogIG5hbWVzKHZlYykgPSBOVUxMCiAgCiAgcmV0dXJuKHZlYykKfQpgYGAKCk5vdywgbGV0J3MgY3JlYXRlIGEgZnVuY3Rpb24gdGhhdCByZXR1cm5zIHRoZSBjYW5vbmljYWwgdmVyc2lvbiBvZiBhbnkga21lcjoKCmBgYHtyfQpnZXRfY2Fub25pY2FsPSBmdW5jdGlvbih2ZWMpewogIHN0cl92ZWMgPSBzdHJfYyh2ZWMsY29sbGFwc2U9JycpCiAgc3RyX3JldiA9IHN0cl9jKHJldmNvbXAodmVjKSxjb2xsYXBzZT0nJykKICAKICBjYW5vbmljYWwgPSBzb3J0KGMoc3RyX3ZlYyxzdHJfcmV2KSlbMV0KICAKICByZXR1cm4odW5saXN0KHN0cl9zcGxpdChjYW5vbmljYWwsJycpKSkKfQpgYGAKCkZpbmFsbHksIGxldCdzIGFwcGx5IHRoaXMgdG8gdGhlIG1hdHJpeCB3aXRoIGFsbCBrbWVycyBhbmQgcmVtb3ZlIGR1cGxpY2F0ZXM6CmBgYHtyfQphbGxfY2Fub25pY2FsID0gYXBwbHkoYWxsX2ttZXJzLDEsZ2V0X2Nhbm9uaWNhbCkgJT4lCiAgdCgpICU+JQogIGFzLmRhdGEuZnJhbWUoKSAlPiUKICBkaXN0aW5jdCgpCgphbGxfY2Fub25pY2FsCmBgYAoKIyBDb3VudGluZyBzdWIta21lcnMKCldlIG5lZWQgdG8gc2NvcmUgZWFjaCBrbWVyIGZvciBhIGJ1bmNoIG9mIHByb3BlcnRpZXMsIHNvIHdlIGNhbiB1c2UgdFNORSBsYXRlciB0byBvcmRlciB0aGVtIGluIDJELiBIZXJlIHdlIHdpbGwgdXNlIGNvdW50cyBvZiBjb250YWluZWQgMm1lcnMsIDNtZXJzIGFuZCA1bWVycyBhbmQgc28gb24sIHVudGlsIHdlIHJlYWNoIGEgc2l6ZSB0aGF0IGlzIG1vcmUgdGhhbiBoYWxmIG9mIHRoZSBvcmlnaW5hbCBrbWVyLiBXaXRoIHRoaXMsIGNvdW50cyBmb3IgZWFjaCBrbWVyIHdpbGwgYmUgdW5pcXVlLgpUaGUgZmFzdGVzdCB3YXkgdG8gZG8gdGhhdCBpcyB0byB0cmFuc2Zvcm0gZXZlcnl0aGluZyB0byBETkFiaW4gZm9ybWF0IGZpcnN0LgoKYGBge3J9CgoKdG9fRE5BYmluID0gZnVuY3Rpb24oeCkgewogIHRvX2NoYXIgPSBjKCdBJywnQycsJ0cnLCdUJykKICBhcGU6OmFzLkROQWJpbihzdHJfYyh0b19jaGFyW2FzLmludGVnZXIodW5saXN0KHgpKV0pKQp9CgoKYWxsX0ROQWJpbnMgPSBhbGxfY2Fub25pY2FsICU+JQogIHNwbGl0KGYgPSAxOm5yb3coLikpICU+JQogIGZ1cnJyOjo6ZnV0dXJlX21hcCh0b19ETkFiaW4pICU+JQogIHRpYmJsZShmb3J3YXJkID0gLikgJT4lCiAgcm93d2lzZSAlPiUKICBtdXRhdGUocmMgPSBsaXN0KGFwZTo6Y29tcGxlbWVudChmb3J3YXJkKSkpICU+JQogIHVuZ3JvdXAKICAKYWxsX0ROQWJpbnMgCgpgYGAKCk5vdyBsZXQncyBsaXN0IGFsbCBzaXplcyBmb3Igc3ViLWttZXJzIHRoYXQgd2Ugd2lsbCBjb3VudDoKCmBgYHtyfQpzdWJfbGVucyA9IDIKcGFzdF9oYWxmID0gRkFMU0UKCmZvciAoaSBpbiAzOihrbWVyX2xlbikpewogIGlmIChpID4ga21lcl9sZW4vMil7cGFzdF9oYWxmID0gVFJVRX0KICAKICBpZiAoYWxsKGFzLmxvZ2ljYWwoaSAlJSBzdWJfbGVucykpKXsKICAgIHN1Yl9sZW5zID0gYyhzdWJfbGVucyxpKQogICAgaWYgKHBhc3RfaGFsZikge2JyZWFrfQogIH0KICAKfQoKc3ViX2xlbnMKYGBgCgoKTm93IHdlIGNhbiBjb3VudCBzdWIta21lcnMuIEZvciBlYWNoIGNhbm9uaWNhbCBrbWVyLCB3ZSB3aWxsIGF2ZXJhZ2UgY291bnRzIHdpdGggaXRzIHJldmVyc2UgY29tcGxlbWVudC4KCmBgYHtyfQpzdWJjb3VudHMgPSBsaXN0KCkKCmZvciAoc3VibCBpbiBzdWJfbGVucyl7CiAgc3ViY291bnRzID0gYXBwZW5kKHN1YmNvdW50cywgbGlzdChhbGxfRE5BYmlucyAlPiUKICBzcGxpdCgxOm5yb3coLikpICU+JQogIGZ1cnJyOjpmdXR1cmVfbWFwX2Rmcih+IGttZXI6Omtjb3VudCgueCRmb3J3YXJkLGsgPSBzdWJsKSAlPiUgYXNfdGliYmxlICU+JQogICAgICAgICAgICAgICAgICBiaW5kX3Jvd3Moa21lcjo6a2NvdW50KC54JHJjLCBrID0gc3VibCkgJT4lIGFzX3RpYmJsZSkgJT4lCiAgICAgICAgICAgICAgICAgIHN1bW1hcmlzZV9hbGwobWVhbiksIC5vcHRpb25zID0gZnVycnJfb3B0aW9ucyhzZWVkID0gVFJVRSkpKQogICkKfQoKCnN1YmNvdW50cwpzYXZlLmltYWdlKCkKYGBgCgoKCiMgdFNORQoKTGV0J3Mgbm93IGRvIHRoZSBvcmRpbmF0aW9uIHdpdGggdC1TTkUuIFdlIHdpbGwgYmluZCBjb3VudHMgb2Ygc3ViLWttZXJzIGFuZCBub3JtYWxpemUgY291bnRzLiAKV2Ugd2lsbCB1c2UgYSBoaWdoZXIgcGVycGxleGl0eSB0aGFuIHRoZSBkZWZhdWx0IHNpbmNlIGl0IHNlZW1zIHRvIHNwcmVhZCBwb2ludHMgYmV0dGVyLiBXZSB3aWxsIGFsc28gaW5jcmVhc2UgdGhlIG51bWJlciBvZiBpdGVyYXRpb25zIHRvIG1ha2Ugc3VyZSB0aGUgY29uZmlndXJhdGlvbiBjb252ZXJnZXMuCgpgYGB7cn0Kbm9ybWFsaXplZCA9IGRvLmNhbGwoYmluZF9jb2xzLCBzdWJjb3VudHMpICU+JQogIGFzLm1hdHJpeCAlPiUKICBSdHNuZTo6bm9ybWFsaXplX2lucHV0KCkKCnRzbmUgPSBSdHNuZShub3JtYWxpemVkLAogICAgICAgICAgICAgZGltcz0yLAogICAgICAgICAgICAgdmVyYm9zZSA9IFQsIAogICAgICAgICAgICAgbm9ybWFsaXplID0gRiwKICAgICAgICAgICAgIG51bV90aHJlYWRzPW5jb3JlcywgCiAgICAgICAgICAgICBwZXJwbGV4aXR5PW1pbigxNTAwLG5yb3cobm9ybWFsaXplZCkvMy0xKSwKICAgICAgICAgICAgIG1heF9pdGVyID0gMjAwMDApCgpzdHIodHNuZSkKCnNhdmUuaW1hZ2UoKQpgYGAKCkxldCdzIG5vdyB2aXN1YWxpemUgdGhlIHJlc3VsdHMuIExldCdzIGJ1aWxkIHRoZSBkYXRhLmZyYW1lIGZpcnN0LgoKYGBge3J9CkFUX3JpY2huZXNzID0gYWxsX0ROQWJpbnMgJT4lCiAgc3BsaXQoMTpucm93KC4pKSAlPiUKICBmdXJycjo6ZnV0dXJlX21hcF9kZnIofiBrbWVyOjprY291bnQoLngkZm9yd2FyZCxrID0gMSkgJT4lIGFzX3RpYmJsZSwKICAgICAgICAgICAgICAgICAgICAgICAgLm9wdGlvbnMgPSBmdXJycl9vcHRpb25zKHNlZWQgPSBUUlVFKSkgJT4lCiAgbXV0YXRlKElEPTE6bigpKSAlPiUKICBnYXRoZXIoa2V5PWJhc2UsdmFsdWU9Y291bnQsLUlEKSAlPiUKICBncm91cF9ieShJRCkgJT4lCiAgc3VtbWFyaXNlKEFUID0gc3VtKGlmZWxzZShiYXNlICVpbiUgYygnQScsJ1QnKSwgY291bnQsIDApL2ttZXJfbGVuKSkgJT4lCiAgYXJyYW5nZShJRCkKCnRzbmVfcmVzID0gdHNuZSRZICU+JQogIGFzLmRhdGEuZnJhbWUoKSAlPiUKICBtdXRhdGUociA9IHNxcnQoVjFeMiArIFYyXjIpLAogICAgICAgICB0ID0gYXRhbjIoVjIsIFYxKSkgJT4lCiAgYmluZF9jb2xzKHNlbGVjdChBVF9yaWNobmVzcywgQVQpKQoKdHNuZV9yZXMkY2Fub25pY2FsID0gcHVycnI6Om1hcChhbGxfRE5BYmlucyRmb3J3YXJkLCAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICB+YXMuY2hhcmFjdGVyKC54KSAlPiUKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHN0cl90b191cHBlciAlPiUKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHN0cl9jKGNvbGxhcHNlPScnKSkgJT4lIHVubGlzdAoKdHNuZV9yZXMkcmMgPSBwdXJycjo6bWFwKGFsbF9ETkFiaW5zJHJjLCAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICB+YXMuY2hhcmFjdGVyKC54KSAlPiUKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHN0cl90b191cHBlciAlPiUKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHN0cl9jKGNvbGxhcHNlPScnKSkgJT4lIHVubGlzdAoKdHNuZV9yZXMkY29tcGxleGl0eSA9IGFjc3ModHNuZV9yZXMkY2Fub25pY2FsLCA0KVssMV0KCnRzbmVfcmVzCgpgYGAKCgpMZXQncyBub3cgdmlzdWFsaXplIHdoYXQgdFNORSBkaWQuIEl0IGNsZWFybHkgc2VwYXJhdGVkIGttZXJzIGJ5IHRoZWlyIEFULXJpY2huZXNzLCBhbG9uZyBib3QgWCBhbmQgWSBheGVzLiBUaGVyZSBpcyBhbHNvIHNvbWUgY2x1c3RlcmluZyBieSBlbnRyb3B5LgpgYGB7cn0KCmdncGxvdCh0c25lX3JlcykgKwogIGdlb21fcG9pbnQoYWVzKHg9VjEsIHk9VjIsY29sb3I9Y29tcGxleGl0eSkpICsKICBzY2FsZV9jb2xvcl92aXJpZGlzX2MoKQoKCmdncGxvdCh0c25lX3JlcykgKwogIGdlb21fcG9pbnQoYWVzKHg9VjEsIHk9VjIsY29sb3I9QVQpKSArCiAgc2NhbGVfY29sb3JfdmlyaWRpc19jKCkKCmBgYApOb3cgbGV0J3Mgcm90YXRlIHNvIHRoYXQgbW9yZSBBVC1yaWNoIGttZXJzIGFyZSBvbiB0aGUgbGVmdC4gTGV0J3MgZmlndXJlIG91dCB0aGUgYXZlcmFnZSBhbmdsZSBiZXR3ZWVuIG1vc3QgQVQtcmljaCBhbmQgQ0ctcmljaCBrbWVycy4KCmBgYHtyfQpyb3RhdGlvbiA9IHRzbmVfcmVzICU+JQogIGZpbHRlcihBVCA9PSAxKSAlPiUKICBwdWxsKHQpICU+JQogIG1lYW4KCnRzbmVfcmVzID0gdHNuZV9yZXMgJT4lCiAgbXV0YXRlKFYxID0gcipjb3ModCAtIHJvdGF0aW9uICsgcGkpLAogICAgICAgICBWMiA9IHIqc2luKHQgLSByb3RhdGlvbiArIHBpKSwKICAgICAgICAgciA9IHNxcnQoVjFeMiArIFYyXjIpLAogICAgICAgICB0ID0gYXRhbjIoVjIsIFYxKSkKICAKYGBgCgpOb3cgbGV0J3MgbWFrZSBzdXJlIHRoYXQgY29tcGxleGl0eSBpcyByb3VnaGx5IGluY3JlYXNpbmcgYWxvbmcgdGhlIFkgYXhpcywgYW5kIGZsaXAgaWYgbm90IHRoZSBjYXNlLgoKYGBge3J9CmlmIChjb3IodHNuZV9yZXMkVjIsdHNuZV9yZXMkY29tcGxleGl0eSkgPCAwKXsKICB0c25lX3JlcyA9IHRzbmVfcmVzICU+JQogIG11dGF0ZShWMiA9IC1WMiwKICAgICAgICAgciA9IHNxcnQoVjFeMiArIFYyXjIpLAogICAgICAgICB0ID0gYXRhbjIoVjIsIFYxKSkKfQpgYGAKCgpOb3cgd2UgaGF2ZSBhIGttZXIgZGlzdHJpYnV0aW9uIHRoYXQgZm9sbG93cyByb3VnaGx5IEFUIHJpY2huZXNzIGxlZnQgdG8gcmlnaHQgYW5kIGNvbXBsZXhpdHkgYm90dG9tIHRvIHRvcDoKYGBge3J9CmdncGxvdCh0c25lX3JlcykgKwogIGdlb21fcG9pbnQoYWVzKHg9VjEsIHk9VjIsY29sb3I9Y29tcGxleGl0eSkpICsKICBzY2FsZV9jb2xvcl92aXJpZGlzX2MoKSArCiAgY29vcmRfZXF1YWwoKQoKCmdncGxvdCh0c25lX3JlcykgKwogIGdlb21fcG9pbnQoYWVzKHg9VjEsIHk9VjIsY29sb3I9QVQpKSArCiAgc2NhbGVfY29sb3JfdmlyaWRpc19jKCkgKwogIGNvb3JkX2VxdWFsKCkKYGBgCgoKCiNTYXZlCgpXZSB3aWxsIG5vdyBzYXZlIHRoZSB0c25lIHRhYmxlIHNvIHdlIGNhbiB1c2UgdG8gcHJvZHVjZSBhIGdyaWQgZm9yIHRoZSBmaW5hbCBmaWd1cmVzLiAKCmBgYHtyfQp3cml0ZV9jc3YodHNuZV9yZXMscGFzdGUwKCd0c25lXycsa21lcl9sZW4sJ19yZXN1bHQuY3N2JykpCmBgYAoKCg==